An Abstraction Layer for SIMD Extensions
Authors
Abstract
This paper presents an abstraction layer for short vector SIMD ISA extensions such as Intel’s SSE, AMD’s 3DNow!, Motorola’s AltiVec, and IBM’s Double Hummer. It provides unified access to short vector instructions via intermediate-level building blocks. These primitives are C macros that allow, for instance, portable and highly efficient implementations of discrete linear transforms such as FFTs and DCTs. The newly developed API is built on top of recently introduced C language extensions that give C programs access to short vector SIMD hardware through data types and intrinsic or built-in functions. Empirical evidence of the success of the portable SIMD API is provided by program runs of short vector versions of Spiral and FFTW, resulting in the fastest FFT implementations to date.
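To make the idea of macro-based building blocks concrete, the following is a minimal sketch of what such a layer might look like. It is not the paper's API: the names `vec_t`, `VEC_LEN`, `VEC_LOAD`, `VEC_STORE`, `VEC_SPLAT`, `VEC_ADD`, `VEC_MUL`, `saxpy`, and `saxpy_self_check` are all hypothetical and chosen only for illustration. The sketch assumes a compiler that defines `__SSE__` and ships `<xmmintrin.h>` when SSE is available, and it falls back to plain scalar C otherwise, so the same kernel source compiles on both kinds of target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable layer; macro names are illustrative, not the paper's. */
#if defined(__SSE__)
#include <xmmintrin.h>                 /* SSE intrinsics */
typedef __m128 vec_t;                  /* four packed single-precision floats */
#define VEC_LEN 4
#define VEC_LOAD(p)      _mm_loadu_ps(p)            /* unaligned load  */
#define VEC_STORE(p, v)  _mm_storeu_ps((p), (v))    /* unaligned store */
#define VEC_SPLAT(s)     _mm_set1_ps(s)             /* broadcast scalar */
#define VEC_ADD(a, b)    _mm_add_ps((a), (b))
#define VEC_MUL(a, b)    _mm_mul_ps((a), (b))
#else
/* Scalar fallback with the same interface (assumption: 4-way vectors). */
typedef struct { float e[4]; } vec_t;
#define VEC_LEN 4
static vec_t vec_load(const float *p)
{ vec_t v; for (int i = 0; i < 4; i++) v.e[i] = p[i]; return v; }
static void vec_store(float *p, vec_t v)
{ for (int i = 0; i < 4; i++) p[i] = v.e[i]; }
static vec_t vec_splat(float s)
{ vec_t v; for (int i = 0; i < 4; i++) v.e[i] = s; return v; }
static vec_t vec_add(vec_t a, vec_t b)
{ for (int i = 0; i < 4; i++) a.e[i] += b.e[i]; return a; }
static vec_t vec_mul(vec_t a, vec_t b)
{ for (int i = 0; i < 4; i++) a.e[i] *= b.e[i]; return a; }
#define VEC_LOAD(p)      vec_load(p)
#define VEC_STORE(p, v)  vec_store((p), (v))
#define VEC_SPLAT(s)     vec_splat(s)
#define VEC_ADD(a, b)    vec_add((a), (b))
#define VEC_MUL(a, b)    vec_mul((a), (b))
#endif

/* Kernel written only against the macros: y[i] = a * x[i] + y[i]. */
static void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n % VEC_LEN == 0);          /* sketch handles whole vectors only */
    vec_t va = VEC_SPLAT(a);
    for (size_t i = 0; i < n; i += VEC_LEN) {
        vec_t vx = VEC_LOAD(x + i);
        vec_t vy = VEC_LOAD(y + i);
        VEC_STORE(y + i, VEC_ADD(VEC_MUL(va, vx), vy));
    }
}

/* Compares the macro-based kernel with a scalar reference; returns 1 on match. */
static int saxpy_self_check(void)
{
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float y[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float ref[8];
    for (int i = 0; i < 8; i++) ref[i] = 2.0f * x[i] + y[i];
    saxpy(8, 2.0f, x, y);
    for (int i = 0; i < 8; i++)
        if (y[i] != ref[i]) return 0;
    return 1;
}
```

Calling `saxpy_self_check()` from a `main` function exercises the kernel on either code path. Supporting another extension in this sketch would mean adding one more `#if` branch that maps the same macro names onto that extension's data type and intrinsics, while `saxpy` itself stays unchanged.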
Related Resources
2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation
This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. Both topics are somewhat related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipel...
Instruction Set Architecture Abstraction
This technical report describes CHERI ISAv3, the third version of the This report describes the CHERI Instruction-Set Architecture (ISA) and design. The purpose of this tutorial was to introduce the computer architecture Pydgin is a framework for rapidly developing instruction-set simulators (ISSs) from a but is particularly well-suited for exploring the hardware/software abstraction. The Intel...
SIMD code generation in data-parallel programming
Today’s desktop PCs feature a variety of parallel processing units. Developing applications that exploit this parallelism is a demanding task, and a programmer has to obtain detailed knowledge about the hardware for efficient implementation. CGiS is a data-parallel programming language providing a unified abstraction for two parallel processing units: graphics processing units (GPUs) and the ve...
Hardware Abstraction Layer (HAL)
ION LAYER FOR IMPLEMENTATION OF EXTENSIONS IN PROGRAMMABLE NETWORKS Hardware Abstraction Layer (HAL)
Chemical Kinetics on Multi-core SIMD Architectures
Chemical kinetics modeling accounts for a significant portion of the computational time of atmospheric models. Effective application of multiple levels of heterogeneous parallelism can significantly reduce computational time, but implementation on emerging multi-core technologies can be prohibitively difficult. We introduce an approach for chemical kinetics modeling on multi-core SIMD architect...